1. Technical Field
The disclosed embodiments relate to the rewriting of web search queries in sponsored search and, more particularly, to the online expansion of a rare query by correlating features of the rare query with features of expanded queries from web and search resources related to more popular queries.
2. Related Art
The explosive growth of the Internet as a publication and interactive communication platform has created an electronic environment that is changing the way business is transacted. As the Internet becomes increasingly accessible around the world, users need efficient tools to navigate the Internet and to find content available on various websites.
Search engines provide a gateway to the World Wide Web (“Web”) for most Internet users. They also support the Web ecosystem by providing much-needed traffic to many websites. Each query submitted to a commercial search engine such as Yahoo! or Google results in two searches. The first search is over the corpus of web pages crawled by the search engine; the web crawl can be viewed as a pull mechanism for obtaining documents. The second search is over the corpus of advertisements provided to the search engine by advertisers through an interface or a feed, and can be viewed as a search over pushed content.
The ad search provides traffic to (mostly) commercial websites that might otherwise not appear in the top web search results for the query. Since advertisers pay for the placement of their ads on the result page, the search of the ad space is commonly called sponsored search. The two main scenarios of sponsored search advertising are exact match, where advertisers specify the exact query (bid phrase) for which the ad is to be shown, and broad match, where queries are matched against ads using a broader criterion. Broad match typically involves matching the query against the ad text, the target website (landing page), or other information related to the user, ad, or advertiser.
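To make the contrast between the two scenarios concrete, the following Python sketch compares them on a toy ad corpus. The text does not specify a broad-match criterion, so this sketch assumes whitespace tokenization and a Jaccard token-overlap score between the query and the ad's bid phrase plus ad text, with a hypothetical threshold of 0.2. The names `Ad`, `exact_match`, and `broad_match` are illustrative only and do not describe any search engine's actual matching system.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    ad_id: str
    bid_phrase: str  # phrase the advertiser explicitly bid on
    text: str        # ad creative (title and description)


def normalize(s: str) -> list[str]:
    # Hypothetical normalization: lowercase and split on whitespace.
    return s.lower().split()


def exact_match(query: str, ads: list[Ad]) -> list[str]:
    # Exact match: the normalized query must equal the normalized bid phrase.
    q = normalize(query)
    return [ad.ad_id for ad in ads if normalize(ad.bid_phrase) == q]


def jaccard(a: set[str], b: set[str]) -> float:
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def broad_match(query: str, ads: list[Ad], threshold: float = 0.2) -> list[str]:
    # Broad match (assumed criterion): token overlap between the query and the
    # ad's bid phrase plus ad text, keeping ads at or above the threshold.
    q = set(normalize(query))
    scored = []
    for ad in ads:
        terms = set(normalize(ad.bid_phrase)) | set(normalize(ad.text))
        score = jaccard(q, terms)
        if score >= threshold:
            scored.append((score, ad.ad_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
    return [ad_id for _, ad_id in scored]


if __name__ == "__main__":
    ads = [
        Ad("a1", "running shoes", "Lightweight running shoes on sale"),
        Ad("a2", "hiking boots", "Waterproof hiking boots for trails"),
    ]
    print(exact_match("cheap trail running shoes", ads))  # no exact bid phrase
    print(broad_match("cheap trail running shoes", ads))  # overlap with a1
```

A rarer query such as "cheap trail running shoes" finds no ad under exact match because no advertiser bid on that phrase, while the looser broad-match criterion still surfaces ad `a1`.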
The volume distribution of web search queries follows a power law. That is, the most frequent queries make up the head and torso of the curve, while the low-volume, rarer queries make up the tail. Although each tail query is individually rare, together tail queries account for a significant portion of the query volume. For this reason, tail queries have significant potential for advertising revenue.
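To illustrate how a power-law curve can leave a large share of total volume in the tail, the sketch below assumes an idealized Zipf distribution, in which the query at popularity rank r receives volume proportional to 1/r^s. The number of distinct queries, the head cutoff, and the exponent s are hypothetical parameters chosen for illustration, not measurements of any search engine's traffic, and `zipf_volume_share` is a hypothetical helper.

```python
import math


def zipf_volume_share(num_queries: int, head_size: int, s: float = 1.0) -> tuple[float, float]:
    """Return (head_share, tail_share) of total volume under an idealized Zipf law.

    Hypothetical model: the query at popularity rank r (1 = most frequent)
    receives volume proportional to 1 / r**s.
    """
    if not 1 <= head_size <= num_queries:
        raise ValueError("head_size must be between 1 and num_queries")
    weights = [1.0 / rank ** s for rank in range(1, num_queries + 1)]
    total = math.fsum(weights)
    head = math.fsum(weights[:head_size])  # volume of the top head_size queries
    head_share = head / total
    return head_share, 1.0 - head_share


if __name__ == "__main__":
    # Hypothetical parameters: 1M distinct queries, head = top 10,000 queries.
    head, tail = zipf_volume_share(num_queries=1_000_000, head_size=10_000)
    print(f"head share: {head:.2f}, tail share: {tail:.2f}")
```

Under these assumed parameters, the top 1% of distinct queries carries about 68% of the volume, and the remaining 99% of distinct queries still carries about 32%. This illustrates why individually rare queries can matter in aggregate. Real query logs will differ in exponent and scale.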
Web search engines return results for most queries, including those in the tail of the curve. This is not, however, the case for sponsored search. An evaluation of two major search engines has shown that only about 30%-40% of the query volume is covered by ad results. The main reason is that tail queries are harder to interpret. In most cases, no advertiser has bid specifically on a tail query, so no ads are explicitly associated with it. Ad matching based on historical click data is also difficult: because of the low volume, it is hard to accumulate enough ad clicks to identify good ads with statistical and explore-exploit methods. Search engines normally avoid displaying irrelevant ads so as not to degrade the user experience, so the current practice is not to advertise on most tail queries. This means failing to capitalize on advertising dollars for at least half of all search queries.